1 Set-associative cache simulation using generalized binomial trees
Rabin A. Sugumar, Santosh G. Abraham 

February 1995 ACM Transactions on Computer Systems (TOCS), volume 13 issue 1
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms, review



Full text available: 



Set-associative caches are widely used in CPU memory hierarchies, I/O subsystems, and 
file systems to reduce average access times. This article proposes an efficient simulation 
technique for simulating a group of set-associative caches in a single pass through the 
address trace, where all caches have the same line size but varying associativities and 
varying number of sets. The article also introduces a generalization of the ordinary 
binomial tree and presents a representation of caches in ... 

Keywords: all-associativity simulation, binomial tree, cache modeling, inclusion 
properties, set-associative caches, single-pass simulation, trace-driven simulation 



2 Processor microarchitecture I: Partitioned first-level cache design for clustered
microarchitectures
Paul Racunas, Yale N. Patt 

June 2003 Proceedings of the 17th annual international conference on 
Supercomputing 

Publisher: ACM Press 

Full text available: pdf(191.74 KB) Additional Information: full citation, abstract, references, citings, index terms

The high clock frequencies of modern superscalar processors make the wire delay 
incurred in moving data across the processor chip a significant concern. As frequencies 
continue to increase, it will become more difficult for a centralized first level data cache to 
supply the timely data bandwidth required by superscalar processors. This paper presents
a complete solution for the partitioning of the first level of the memory hierarchy. The first 
level data cache is split into several independent pa ... 

Keywords: clustered microarchitecture, partitioned cache 



3 Session 8C: The set-associative cache performance of search trees
James D. Fix 

January 2003 Proceedings of the fourteenth annual ACM-SIAM symposium on 
Discrete algorithms 

Publisher: Society for Industrial and Applied Mathematics 

Full text available: pdf(787.50 KB) Additional Information: full citation, abstract, references, index terms



We consider the costs of access to data stored in search trees assuming that those 
memory accesses are managed with a cache. Our cache memory model is two-level, has 
a small degree of set-associativity, and uses LRU replacement, and we consider the 
number of cache misses that a set of accesses incurs. For standard tree access (searches
and traversals), changing the degree of set-associativity has no effect on performance. To
explain this, we develop general stochastic access models, an adaptation ... 

4 Cache memory performance in a UNIX environment

Cedell Alexander, William Keshlear, Furrokh Cooper, Faye Briggs 

June 1986 ACM SIGARCH Computer Architecture News, volume 14 issue 3 

Publisher: ACM Press 

Full text available: pdf(2.10 MB) Additional Information: full citation, citings, index terms



5 On the inclusion properties for multi-level cache hierarchies
J.-L. Baer, W.-H. Wang

May 1988 ACM SIGARCH Computer Architecture News, Proceedings of the 15th
Annual International Symposium on Computer architecture ISCA '88,
Volume 16 Issue 2

Publisher: IEEE Computer Society Press, ACM Press 

Full text available: pdf(886.24 KB) Additional Information: full citation, abstract, references, citings, index terms

The inclusion property is essential in reducing the cache coherence complexity for 
multiprocessors with multilevel cache hierarchies. We give some necessary and sufficient 
conditions for imposing the inclusion property for fully- and set-associative caches which 
allow different block sizes at different levels of the hierarchy. Three multiprocessor 
structures with a two-level cache hierarchy (single cache extension, multiport second- 
level cache, bus-based) are examined. The feasibility of im ... 



6 Towards a theory of cache-efficient algorithms
Sandeep Sen, Siddhartha Chatterjee, Neeraj Dumir 
November 2002 Journal of the ACM (JACM), volume 49 issue 6 
Publisher: ACM Press 

Full text available: pdf(273.41 KB) Additional Information: full citation, abstract, references, index terms

We present a model that enables us to analyze the running time of an algorithm on a 
computer with a memory hierarchy with limited associativity, in terms of various cache 
parameters. Our cache model, an extension of Aggarwal and Vitter's I/O model, enables 
us to establish useful relationships between the cache complexity and the I/O complexity 
of computations. As a corollary, we obtain cache-efficient algorithms in the single-level 
cache model for fundamental problems like sorting, FFT, and an i ... 

Keywords: Hierarchical memory, I/O complexity, lower bound 



7 The V-Way Cache: Demand Based Associativity via Global Replacement
Moinuddin K. Qureshi, David Thompson, Yale N. Patt 

May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd
Annual International Symposium on Computer Architecture ISCA '05,
Volume 33 Issue 2
Publisher: IEEE Computer Society, ACM Press 

Full text available: pdf(231.93 KB) Additional Information: full citation, abstract, index terms

As processor speeds increase and memory latency becomes more critical, intelligent 
design and management of secondary caches becomes increasingly important. The 
efficiency of current set-associative caches is reduced because programs exhibit a non- 
uniform distribution of memory accesses across different cache sets. We propose a 
technique to vary the associativity of a cache on a per-set basis in response to the 
demands of the program. By increasing the number of tag-store entries relative to the ... 



8 On the inclusion properties for multi-level cache hierarchies
Jean-Loup Baer, Wen-Hann Wang

August 1998 25 years of the international symposia on Computer architecture 
(selected papers) 

Publisher: ACM Press 

Full text available: pdf(876.77 KB) Additional Information: full citation, references, citings, index terms



9 Cache optimization for embedded processor cores: An analytical approach
Arijit Ghosh, Tony Givargis 

October 2004 ACM Transactions on Design Automation of Electronic Systems
(TODAES), Volume 9 Issue 4
Publisher: ACM Press 

Full text available: pdf(236.72 KB) Additional Information: full citation, abstract, references, index terms

Embedded microprocessor cores are increasingly being used in embedded and mobile 
devices. The software running on these embedded microprocessor cores is often a priori 
known; thus, there is an opportunity for customizing the cache subsystem for improved 
performance. In this work, we propose an efficient algorithm to directly compute cache 
parameters satisfying desired performance criteria. Our approach avoids simulation and 
exhaustive exploration, and, instead, relies on an exact algorithmic ... 

Keywords: Cache optimization, core-based design, design space exploration, system-on- 
a-chip 



10 Decoupled sectored caches: conciliating low tag implementation cost
A. Seznec 

April 1994 ACM SIGARCH Computer Architecture News, Proceedings of the 21ST
annual international symposium on Computer architecture ISCA '94, volume
22 Issue 2

Publisher: IEEE Computer Society Press, ACM Press 

Full text available: pdf(1.06 MB) Additional Information: full citation, abstract, references, citings, index terms

Sectored caches have been used for many years in order to reconcile low tag array size 
and small or medium block size. In a sectored cache, a single address tag is associated 
with a sector consisting of several cache lines, while validity, dirty and coherency tags
are associated with each of the inner cache lines. Maintaining a low tag array size is a 
major issue in many cache designs (e.g. L2 caches). Using a sectored cache is a design 
trade-off between a low size of the tag array which is possi ... 

11 Tradeoffs in two-level on-chip caching
N. P. Jouppi, S. J. E. Wilton 

April 1994 ACM SIGARCH Computer Architecture News, Proceedings of the 21ST
annual international symposium on Computer architecture ISCA '94, volume
22 Issue 2

Publisher: IEEE Computer Society Press, ACM Press 

Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references, citings, index terms

The performance of two-level on-chip caching is investigated for a range of technology 
and architecture assumptions. The area and access time of each level of cache is modeled 
in detail. The results indicate that for most workloads, two-level cache configurations 
(with a set-associative second level) perform marginally better than single-level cache 
configurations that require the same chip area once the first-level cache sizes are 64KB or 
larger. Two-level configurations become even more import ... 

12 The pool of subsectors cache design
Jeffrey B. Rothman, Alan Jay Smith 

May 1999 Proceedings of the 13th international conference on Supercomputing 

Publisher: ACM Press 

Full text available: pdf(1.69 MB) Additional Information: full citation, references, citings, index terms



13 Cache performance of operating system and multiprogramming workloads
Anant Agarwal, John Hennessy, Mark Horowitz

November 1988 ACM Transactions on Computer Systems (TOCS), volume 6 issue 4
Publisher: ACM Press 

Full text available: pdf(3.16 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Large caches are necessary in current high-performance computer systems to provide the 
required high memory bandwidth. Because a small decrease in cache performance can 
result in significant system performance degradation, accurately characterizing the 
performance of large caches is important. Although measurements on actual systems 
have shown that operating systems and multiprogramming can affect cache performance, 
previous studies have not focused on these effects. We have developed a pro ... 

14 Trace-driven simulations for a two-level cache design in open bus systems 
Hakon O. Bugge, Ernst H. Kristiansen, Bjorn O. Bakka 

May 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 17th
annual international symposium on Computer Architecture ISCA '90, volume
18 Issue 3a
Publisher: ACM Press 

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings, index terms

Two-level cache hierarchies will be a design issue in future high-performance CPUs. In 
this paper we evaluate various metrics for data cache designs. We discuss both one- and
two-level cache hierarchies. Our target is a new 100+ mips CPU, but the methods are 
applicable to any cache design. The basis of our work is a new trace-driven, multiprocess 
cache simulator. The simulator incorporates a simple priority-based scheduler which 
controls the execution ... 

15 A case for two-way skewed-associative caches
Andre Seznec 

May 1993 ACM SIGARCH Computer Architecture News, Proceedings of the 20th
annual international symposium on Computer architecture ISCA '93, volume
21 Issue 2

Publisher: ACM Press 

Full text available: pdf(975.20 KB) Additional Information: full citation, references, citings, index terms



16 Inexpensive implementations of set-associativity
R. E. Kessler, R. Jooss, A. Lebeck, M. D. Hill

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the 16th
annual international symposium on Computer architecture ISCA '89, volume
17 Issue 3
Publisher: ACM Press 

Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references, citings, index terms

The traditional approach to implementing wide set-associativity is expensive, requiring a 
wide tag memory (directory) and many comparators. Here we examine alternative 
implementations of associativity that use hardware similar to that used to implement a 
direct-mapped cache. One approach scans tags serially from most-recently used to least- 
recently used. Another uses a partial compare of a few bits from each tag to reduce the 
number of tags that must be examined serially. The drawback of bo ... 

17 Use-Based Register Caching with Decoupled Indexing 
J. Adam Butts, Gurindar S. Sohi 

March 2004 ACM SIGARCH Computer Architecture News, Proceedings of the 31st
annual international symposium on Computer architecture ISCA '04,
Volume 32 Issue 2
Publisher: IEEE Computer Society, ACM Press 



Full text available: pdf(182.25 KB) Additional Information: full citation, abstract



Wide, deep pipelines need many physical registers to hold the results of in-flight
instructions. Simultaneously, high clock frequencies prohibit using large register files and
bypass networks without a significant performance penalty. Previously proposed
techniques using register caching to reduce this penalty suffer from several problems
including poor insertion and replacement decisions and the need for a
fully-associative cache for good performance. We present novel mechanisms for managing and
indexin ...

18 Cache Optimization For Embedded Processor Cores: An Analytical Approach
Arijit Ghosh, Tony Givargis 

November 2003 Proceedings of the 2003 IEEE/ACM international conference on
Computer-aided design 

Publisher: IEEE Computer Society 

Full text available: pdf(141.16 KB) Additional Information: full citation, abstract, index terms

Embedded microprocessor cores are increasingly being used in embedded and mobile
devices. The software running on these embedded microprocessor cores is often a priori
known; thus, there is an opportunity for customizing the cache subsystem for improved
performance. In this work, we propose an efficient algorithm to directly compute cache
parameters satisfying desired performance criteria. Our approach avoids simulation and
exhaustive exploration, and, instead, relies on an exact algorithmic approach. We ...

Keywords: Cache Optimization, Core-Based Design, Design Space Exploration, System- 
on-a-Chip 



19 System-level power optimization: techniques and tools 
Luca Benini, Giovanni de Micheli 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 5 Issue 2
Publisher: ACM Press 

Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We 
consider electronic systems consisting of a hardware platform and software layers. We
consider the three major constituents of hardware that consume energy, namely 
computation, communication, and storage units, and we review methods of reducing their 
energy consumption. We also study models for analyzing the energy cost of software, and 
methods for energy-efficient software design and compilation. This survey ...

20 Cache: Enhancing data cache reliability by the addition of a small fully-associative
replication cache
Wei Zhang

June 2004 Proceedings of the 18th annual international conference on 
Supercomputing 

Publisher: ACM Press 

Full text available: pdf(265.07 KB) Additional Information: full citation, abstract, references, index terms

Soft error conscious cache design is a necessity for reliable computing. ECC or parity- 
based integrity checking technique in use today either compromises performance for 
reliability or vice versa, and the N modular redundancy (NMR) scheme is too costly for 
microprocessors and applications with stringent cost constraint. This paper proposes a 
novel and cost-effective solution to enhance data reliability with minimum impact on 
performance. The idea is to add a small fully-associative cache to stor ... 

Keywords: in-cache replication, soft error, write-back cache 